Abstract

Kernelization algorithms for the Cluster Editing problem have been a popular topic in recent research on parameterized computation. Thus far, most kernelization algorithms for this problem have been based on the concept of critical cliques. In this paper, we present new observations and new techniques for the study of kernelization algorithms for the Cluster Editing problem. Our techniques are based on the relationship between cluster editing and graph edge-cuts. As an application, we present an $O(n^2)$-time algorithm that constructs a kernel with at most $2k$ vertices for the weighted version of the Cluster Editing problem. Our result matches the best known kernel size for the unweighted version of the problem and significantly improves on the previous best kernel, of quadratic size, for the weighted version.

1 Introduction 

Errors are ubiquitous in experiments, and we often have to recover the true information hidden behind them, that is, to remove the inconsistencies in experimental data. In most cases, we want to make the data consistent with the fewest possible modifications, i.e., we assume that the errors are few. This is an everyday problem in real life, and it has been studied by researchers in many different areas [3, 35]. A graph-theoretic formulation of the problem, called the Cluster Editing problem, seeks a collection of edge insertion/deletion operations of minimum cost that transforms a given graph into a union of disjoint cliques. The Cluster Editing problem has applications in many areas, including machine learning [3], the World Wide Web [12], data mining [4], information retrieval [19], and computational biology [10]. The problem is also closely related to another interesting and important problem in algorithmic research, clustering aggregation [?], which, given a set of clusterings on the same set of vertices, asks for a single clustering that agrees as much as possible with the input clusterings.

Let $G = (V, E)$ be an undirected graph, and let $V^2$ be the set of all unordered pairs of vertices in $G$ (thus, for two vertices $v$ and $w$, $\{v, w\}$ and $\{w, v\}$ are regarded as the same pair). Let $\pi: V^2 \to \mathbb{N} \cup \{+\infty\}$ be a weight function, where $\mathbb{N}$ is the set of positive integers. The weight of an edge $[v, w]$ in $G$ is defined to be $\pi(v, w)$. If vertices $v$ and $w$ are not adjacent and we add an edge between them, then we say that we insert an edge $[v, w]$ of weight $\pi(v, w)$.

The weighted cluster editing problem is formally defined as follows: 

(Weighted) Cluster Editing: Given $(G, \pi, k)$, where $G = (V, E)$ is an undirected graph, $\pi: V^2 \to \mathbb{N} \cup \{+\infty\}$ is a weight function, and $k$ is an integer, is it possible to transform $G$ into a union of disjoint cliques by edge deletions and/or edge insertions such that the total weight of the inserted and deleted edges is bounded by $k$?
The problem is NP-complete even in its unweighted version [25]. Polynomial-time approximation algorithms for the problem have been studied. The best result is a randomized approximation algorithm with expected approximation ratio 3 by Ailon, Charikar, and Newman [1], which was later derandomized by van Zuylen and Williamson [27]. The problem has also been shown to be APX-hard by Charikar, Guruswami, and Wirth [8].

Recently, some researchers have turned their attention to exact solutions and to parameterized algorithms for the problem. A closely related topic is the study of kernelization algorithms for the problem, which, given an instance $(G, \pi, k)$ of Cluster Editing, produce an "equivalent" instance $(G', \pi', k')$ such that $k' \le k$ and the kernel size (i.e., the number of vertices in the graph $G'$) is small. For the unweighted version of the problem (i.e., $\pi(v, w) = 1$ for every pair of vertices $v$ and $w$), Gramm et al. [17] presented the first parameterized algorithm, running in time $O(2.27^k + n^3)$, and a kernelization algorithm that produces a kernel of $O(k^2)$ vertices. This result was soon improved by a sequence of kernelization algorithms producing kernels of size $24k$ [15], $4k$ [18], and $2k$ [9]. The $24k$ kernel was obtained via crown reduction, while the latter two results were both based on the concept of simple series modules (critical cliques), a restricted version of modular decomposition [?]. Basically, these algorithms iteratively construct the modular decomposition, find reducible simple series modules, and apply reduction rules to them, until no reducible module remains.

For the weighted version, to the best of our knowledge, the only nontrivial result on kernelization is the quadratic kernel developed by Böcker et al. [7].

The main result of this paper is the following theorem: 

Theorem 1.1 There is an $O(n^2)$-time kernelization algorithm for the weighted Cluster Editing problem that produces a kernel with at most $2k$ vertices.

Compared with all previous results, Theorem 1.1 is better not only in kernel size and running time but also, more importantly, in conceptual simplicity.

A more general version of the weighted Cluster Editing problem allows real weights, that is, the weight function $\pi$ is replaced by $\pi': V^2 \to \mathbb{R}_{\ge 1} \cup \{+\infty\}$, where $\mathbb{R}_{\ge 1}$ is the set of all real numbers greater than or equal to 1, and correspondingly $k$ becomes a positive real number. Our result also works for this version, with the same running time and only a small relaxation in the constant of the kernel size.

Our contribution. We report the first linear vertex kernel, with a very small constant, for the weighted version of the Cluster Editing problem. Our contributions include:

1. the cutting lemmas (some of which are not used by our kernelization algorithm), which are of potential use in future work on kernelization and algorithms for the problem;

2. a very simple idea and procedure, with an efficient implementation that runs in time $O(n^2)$. Indeed, we use only a single reduction rule, which works for both the weighted and unweighted versions;

3. reduction processes that are independent of $k$, and are therefore more general and more widely applicable.

2 Cutting Lemmas 

In this paper, graphs are always undirected and simple. A graph is complete if each pair of its vertices is connected by an edge. A clique in a graph $G$ is a subgraph $G'$ of $G$ such that $G'$ is a complete graph. By definition, a clique of $h$ vertices contains $\binom{h}{2} = h(h-1)/2$ edges. If two vertices $v$ and $w$ are not adjacent, then we say that the edge $[v, w]$ is missing, and we call the pair $\{v, w\}$ an anti-edge. The total number of anti-edges in a graph $G$ of $n$ vertices is $n(n-1)/2 - |E(G)|$. The subgraph of $G$ induced by a vertex subset $X$ is denoted by $G[X]$.

Let $G = (V, E)$ be a graph, and let $S \subseteq V^2$. Denote by $G \triangle S$ the graph obtained from $G$ as follows: for each pair $\{v, w\}$ in $S$, if $[v, w]$ is an edge in $G$, then remove the edge $[v, w]$ from the graph, while if $\{v, w\}$ is an anti-edge, then insert the edge $[v, w]$ into the graph. A set $S \subseteq V^2$ is a solution to a graph $G = (V, E)$ if the graph $G \triangle S$ is a union of disjoint cliques.
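To make these definitions concrete, here is a minimal Python sketch (an illustration only, not part of the paper's algorithm) that computes $G \triangle S$ and tests whether $S$ is a solution. It assumes that vertices are hashable labels and that edges and pairs are two-element `frozenset`s; the function names are hypothetical. It relies on the standard fact that a graph is a union of disjoint cliques exactly when no vertex has two non-adjacent neighbors.

```python
from itertools import combinations


def sym_diff(edges, S):
    """Edge set of G (triangle) S: every pair in S is toggled
    (an existing edge is deleted, an anti-edge is inserted)."""
    return set(edges) ^ set(S)


def is_cluster_graph(vertices, edges):
    """True iff the graph is a union of disjoint cliques, i.e. no vertex has
    two neighbours that are non-adjacent (no induced path on 3 vertices)."""
    adj = {v: set() for v in vertices}
    for e in edges:
        u, x = tuple(e)
        adj[u].add(x)
        adj[x].add(u)
    for v in vertices:
        for u, x in combinations(adj[v], 2):
            if x not in adj[u]:
                return False
    return True


def is_solution(vertices, edges, S):
    """S is a solution to G = (vertices, edges) iff G triangle S is a cluster graph."""
    return is_cluster_graph(vertices, sym_diff(edges, S))
```

For the path a-b-c, both $S = \{\{a, c\}\}$ (completing the triangle) and $S = \{\{b, c\}\}$ (cutting the path) are solutions, while the empty set is not.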

For an instance $(G, \pi, k)$ of Cluster Editing, where $G = (V, E)$, the weight of a set $S \subseteq V^2$ is defined as $\pi(S) = \sum_{\{v, w\} \in S} \pi(v, w)$. Similarly, for a set $E'$ of edges in $G$, the weight of $E'$ is $\pi(E') = \sum_{[v, w] \in E'} \pi(v, w)$. Therefore, the instance $(G, \pi, k)$ asks whether there is a solution to $G$ whose weight is bounded by $k$.

For a vertex $v$, denote by $N(v)$ the set of neighbors of $v$, and let $N[v] = N(v) \cup \{v\}$. For a vertex set $X$, let $N[X] = \bigcup_{x \in X} N[x]$ and $N(X) = N[X] \setminus X$. For the vertex set $X$, define $\overline{X} = V \setminus X$. For two vertex subsets $X$ and $Y$, denote by $E(X, Y)$ the set of edges that have one end in $X$ and the other end in $Y$. For a vertex subset $X$, the edge set $E(X, \overline{X})$ is called the cut of $X$. The total weight of the cut of $X$ is denoted by $\gamma(X) = \pi(E(X, \overline{X}))$. Obviously, $\gamma(X) = \gamma(\overline{X})$. For an instance $(G, \pi, k)$ of the Cluster Editing problem, denote by $\omega(G)$ the weight of an optimal (i.e., minimum-weight) solution to the graph $G$.
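The neighborhood and cut notation can be illustrated with a short Python sketch. It assumes an adjacency-set representation `adj` (a dict mapping each vertex to the set of its neighbors) and a weight dict `w` keyed by two-element `frozenset`s; all helper names are ours and purely illustrative.

```python
def adjacency(vertices, edges):
    """Build adjacency sets from edges given as two-element frozensets."""
    adj = {v: set() for v in vertices}
    for e in edges:
        u, x = tuple(e)
        adj[u].add(x)
        adj[x].add(u)
    return adj


def closed_nbhd(adj, v):
    """N[v]: the neighbours of v together with v itself."""
    return adj[v] | {v}


def nbhd_of_set(adj, X):
    """N(X): the union of N[x] over x in X, minus X itself."""
    closed = set()
    for x in X:
        closed |= adj[x] | {x}
    return closed - set(X)


def weight(pairs, w):
    """pi(S): total weight of a collection of vertex pairs."""
    return sum(w[p] for p in pairs)


def gamma(adj, w, X):
    """gamma(X) = pi(E(X, complement of X)), the weight of the cut of X."""
    X = set(X)
    cut = {frozenset((x, y)) for x in X for y in adj[x] if y not in X}
    return weight(cut, w)
```

On the path a-b-c with edge weights 2 and 3, the cut of $\{a, b\}$ and the cut of its complement $\{c\}$ both have weight 3, as $\gamma(X) = \gamma(\overline{X})$ requires.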

Behind all of the following lemmas is a very simple observation: in the target graph $G \triangle S$ for any solution $S$ to the graph $G$, each induced subgraph is also a union of disjoint cliques. Therefore, a solution $S$ to the graph $G$ restricted to an induced subgraph $G'$ of $G$ (i.e., the pairs of $S$ in which both vertices are in $G'$) is also a solution to the subgraph $G'$. This observation leads to the following Cutting Lemma.

Lemma 2.1 Let $\mathcal{P} = \{V_1, V_2, \ldots, V_p\}$ be a vertex partition of a graph $G$, and let $E_{\mathcal{P}}$ be the set of edges whose two ends belong to two different parts of $\mathcal{P}$. Then $\sum_{i=1}^{p} \omega(G[V_i]) \le \omega(G) \le \pi(E_{\mathcal{P}}) + \sum_{i=1}^{p} \omega(G[V_i])$.

Proof. Let $S$ be an optimal solution to the graph $G$. For $1 \le i \le p$, let $S_i$ be the subset of $S$ such that each pair in $S_i$ has both its vertices in $V_i$. As noted above, the set $S_i$ is a solution to the graph $G[V_i]$, which implies $\omega(G[V_i]) \le \pi(S_i)$. Thus,

$$\omega(G) = \pi(S) \ge \sum_{i=1}^{p} \pi(S_i) \ge \sum_{i=1}^{p} \omega(G[V_i]).$$

On the other hand, if we remove all edges in $E_{\mathcal{P}}$ and, for each $i$, apply an optimal solution $S_i^*$ to the induced subgraph $G[V_i]$, we obviously end up with a union of disjoint cliques. Therefore, these operations form a solution to the graph $G$ whose weight is $\pi(E_{\mathcal{P}}) + \sum_{i=1}^{p} \pi(S_i^*) = \pi(E_{\mathcal{P}}) + \sum_{i=1}^{p} \omega(G[V_i])$. This immediately gives $\omega(G) \le \pi(E_{\mathcal{P}}) + \sum_{i=1}^{p} \omega(G[V_i])$. □
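On very small graphs, Lemma 2.1 can be sanity-checked by brute force. The Python sketch below is an illustration under our own assumptions, not part of the kernelization: it computes $\omega(G)$ by enumerating every partition of the vertex set into clusters (an optimal solution is determined by the clique partition of $G \triangle S$) and compares both bounds of the lemma on random weighted graphs. The function names are hypothetical, and the enumeration is exponential, so it is meant only for a handful of vertices.

```python
import random
from itertools import combinations


def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):          # put `first` into an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part              # or into a new block of its own


def clustering_cost(vertices, edges, w, blocks):
    """Weight of the edits turning G[vertices] into the clusters `blocks`."""
    label = {v: i for i, block in enumerate(blocks) for v in block}
    cost = 0
    for u, v in combinations(vertices, 2):
        p = frozenset((u, v))
        if (label[u] == label[v]) != (p in edges):   # anti-edge inside or edge across
            cost += w[p]
    return cost


def omega(vertices, edges, w):
    """Brute-force omega(G[vertices]); exponential, tiny graphs only."""
    vertices = list(vertices)
    return min(clustering_cost(vertices, edges, w, b)
               for b in set_partitions(vertices))


def check_lemma_2_1(n, parts, seed):
    """Check both inequalities of Lemma 2.1 on a random weighted graph."""
    rng = random.Random(seed)
    V = list(range(n))
    pairs = [frozenset(p) for p in combinations(V, 2)]
    E = {p for p in pairs if rng.random() < 0.5}
    w = {p: rng.randint(1, 3) for p in pairs}
    part_of = {v: i for i, P in enumerate(parts) for v in P}
    cross = sum(w[p] for p in E if len({part_of[v] for v in p}) == 2)  # pi(E_P)
    inside = sum(omega(P, E, w) for P in parts)
    return inside <= omega(V, E, w) <= cross + inside
```

For example, `check_lemma_2_1(6, [[0, 1, 2], [3, 4], [5]], seed)` should return `True` for every seed.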

Lemma 2.1 directly implies the following corollaries. First, if there is no edge between two different parts of the vertex partition $\mathcal{P}$, then Lemma 2.1 gives

Corollary 2.2 Let $G$ be a graph with connected components $G_1, \ldots, G_p$. Then $\omega(G) = \sum_{i=1}^{p} \omega(G_i)$, and every optimal solution to the graph $G$ is a union of optimal solutions to the subgraphs $G_1, \ldots, G_p$.

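When $p = 2$, i.e., the vertex partition is $\mathcal{P} = \{X, \overline{X}\}$, the edge set $E_{\mathcal{P}}$ becomes the cut $E(X, \overline{X})$, and $\pi(E(X, \overline{X})) = \gamma(X)$. Lemma 2.1 gives

Corollary 2.3 Let $X \subseteq V$ be a vertex set. Then $\omega(G[X]) + \omega(G[\overline{X}]) \le \omega(G) \le \omega(G[X]) + \omega(G[\overline{X}]) + \gamma(X)$.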

Corollary 2.4 Let $G$ be a graph, and let $S^*$ be an optimal solution to $G$. For any subset $X$ of vertices in $G$, if we let $S^*(X, \overline{X})$ be the subset of pairs in $S^*$ in which one vertex is in $X$ and the other vertex is in $\overline{X}$, then $\pi(S^*(X, \overline{X})) \le \gamma(X)$.

Proof. The optimal solution $S^*$ can be divided into three disjoint parts: the subset $S^*(X)$ of pairs in which both vertices are in $X$, the subset $S^*(\overline{X})$ of pairs in which both vertices are in $\overline{X}$, and the subset $S^*(X, \overline{X})$ of pairs in which one vertex is in $X$ and the other vertex is in $\overline{X}$. By Corollary 2.3,

$$\omega(G) = \pi(S^*(X)) + \pi(S^*(\overline{X})) + \pi(S^*(X, \overline{X})) \le \omega(G[X]) + \omega(G[\overline{X}]) + \gamma(X).$$

Since $\pi(S^*(X)) \ge \omega(G[X])$ and $\pi(S^*(\overline{X})) \ge \omega(G[\overline{X}])$, we immediately get $\pi(S^*(X, \overline{X})) \le \gamma(X)$. □

Corollary 2.4 can be informally described as the "cut preferred" principle, which is fundamental for this problem. Similarly, we have the following lemmas.
Lemma 2.5 Let $X$ be a subset of vertices in a graph $G$, and let $S^*$ be any optimal solution to $G$. Let $S^*(V, X)$ be the set of pairs in $S^*$ in which at least one vertex is in $X$. Then $\omega(G) \ge \omega(G[\overline{X}]) + \pi(S^*(V, X))$.

Proof. The optimal solution $S^*$ is divided into two disjoint parts: the subset $S^*(\overline{X})$ of pairs in which both vertices are in $\overline{X}$, and the subset $S^*(V, X)$ of pairs in which at least one vertex is in $X$. The set $S^*(\overline{X})$ is a solution to the induced subgraph $G[\overline{X}]$. Therefore, $\pi(S^*(\overline{X})) \ge \omega(G[\overline{X}])$. This gives

$$\omega(G) = \pi(S^*) = \pi(S^*(\overline{X})) + \pi(S^*(V, X)) \ge \omega(G[\overline{X}]) + \pi(S^*(V, X)),$$

which proves the lemma. □

Lemma 2.6 Let $X$ be a subset of vertices in a graph $G$, and let $B_X$ be the set of vertices in $X$ that are adjacent to vertices in $\overline{X}$. Then for any optimal solution $S^*$ to $G$, if we let $S^*(B_X)$ be the set of pairs in $S^*$ in which both vertices are in $B_X$, then $\omega(G) + \pi(S^*(B_X)) \ge \omega(G[X]) + \omega(G[\overline{X} \cup B_X])$.

Proof. Again, the optimal solution $S^*$ can be divided into three disjoint parts: the subset $S^*(X)$ of pairs in which both vertices are in $X$, the subset $S^*(\overline{X})$ of pairs in which both vertices are in $\overline{X}$, and the subset $S^*(X, \overline{X})$ of pairs in which one vertex is in $X$ and the other vertex is in $\overline{X}$. We also denote by $S^*(B_X, \overline{X})$ the subset of pairs in $S^*$ in which one vertex is in $B_X$ and the other vertex is in $\overline{X}$. Since $S^*(X)$ is a solution to the induced subgraph $G[X]$, we have

$$\begin{aligned}
\omega(G) + \pi(S^*(B_X)) &= \pi(S^*(X)) + \pi(S^*(\overline{X})) + \pi(S^*(X, \overline{X})) + \pi(S^*(B_X)) \\
&\ge \omega(G[X]) + \pi(S^*(\overline{X})) + \pi(S^*(X, \overline{X})) + \pi(S^*(B_X)) \\
&\ge \omega(G[X]) + \pi(S^*(\overline{X})) + \pi(S^*(B_X, \overline{X})) + \pi(S^*(B_X)).
\end{aligned}$$

The last inequality holds because $B_X \subseteq X$, so $S^*(B_X, \overline{X}) \subseteq S^*(X, \overline{X})$. Since $S' = S^*(\overline{X}) \cup S^*(B_X, \overline{X}) \cup S^*(B_X)$ is the subset of pairs in $S^*$ in which both vertices are in $\overline{X} \cup B_X$, the set $S'$ is a solution to the induced subgraph $G[\overline{X} \cup B_X]$. This gives

$$\pi(S') = \pi(S^*(\overline{X})) + \pi(S^*(B_X, \overline{X})) + \pi(S^*(B_X)) \ge \omega(G[\overline{X} \cup B_X]),$$

which immediately implies the lemma. □

The above results, which reveal the relations between the structure of the Cluster Editing problem and graph edge cuts, not only form the basis of the kernelization results presented in this paper but are also of independent interest.

3 The kernelization algorithm 

Obviously, the number of distinct vertices included in a solution $S$ of $k$ vertex pairs to a graph $G$ is bounded by $2k$. Thus, if we can also bound the number of vertices that are not included in $S$, we obtain a kernel. For such a vertex $v$, the clique containing $v$ in $G \triangle S$ must be $G[N[v]]$. Inspired by this, our approach is to examine the closed neighborhood $N[v]$ of each vertex $v$.

The observation is that if an induced subgraph (e.g., the closed neighborhood of a vertex) is internally very dense and only loosely connected to the outside (i.e., there are very few edges in its cut), then it might be cut off and solved separately. By the cutting lemmas, the weight of a solution obtained in this way should not be far from that of an optimal solution. In fact, we will identify conditions under which the two are equal.

The subgraph we consider is $G[N[v]]$ for some vertex $v$. For the connection of $N[v]$ to the outside, a good measure is $\gamma(N[v])$, so we only need to define the density. A simple fact is that the fewer edges are missing, the denser the subgraph is. Therefore, to measure the density of $N[v]$, we define the deficiency $\delta(v)$ of $N[v]$ as the total weight of anti-edges in $G[N[v]]$, formally $\delta(v) = \pi(\{\{x, y\} \mid x, y \in N(v), [x, y] \notin E\})$.
Suppose that $N[v]$ forms a single clique with no other vertices in the resulting graph $G \triangle S$. Then anti-edges of total weight $\delta(v)$ have to be inserted to make $N[v]$ a clique, and edges of total weight $\gamma(N[v])$ have to be deleted to separate $N[v]$ from the rest of the graph. Based on this, we define the stable cost of a vertex $v$ as $\rho(v) = 2\delta(v) + \gamma(N[v])$, and we say that $N[v]$ is reducible if $\rho(v) < |N[v]|$.
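A minimal Python sketch of the quantities just defined: it computes $\delta(v)$, $\gamma(N[v])$, and the stable cost $\rho(v)$, and tests reducibility. It assumes an adjacency-set dict `adj` and a weight dict `w` keyed by two-element `frozenset`s; both representations and the function names are our own illustrative choices. Since $v$ is adjacent to every vertex of $N(v)$, all anti-edges of $G[N[v]]$ lie inside $N(v)$.

```python
from itertools import combinations


def stable_cost(adj, w, v):
    """Return (delta(v), gamma(N[v]), rho(v)) with rho(v) = 2*delta(v) + gamma(N[v])."""
    closed = adj[v] | {v}
    # deficiency: anti-edges inside N[v]; v sees everyone, so they lie in N(v)
    delta = sum(w[frozenset((x, y))]
                for x, y in combinations(adj[v], 2) if y not in adj[x])
    # cut of N[v]: each edge leaving the closed neighbourhood, counted once
    gamma = sum(w[frozenset((x, y))]
                for x in closed for y in adj[x] if y not in closed)
    return delta, gamma, 2 * delta + gamma


def is_reducible(adj, w, v):
    """N[v] is reducible iff rho(v) < |N[v]|."""
    return stable_cost(adj, w, v)[2] < len(adj[v]) + 1
```

For instance, on a unit-weight graph where $N[a] = \{a, b, c, d\}$ misses only the edge $[c, d]$ and $d$ has one outside neighbor, $\rho(a) = 2 \cdot 1 + 1 = 3 < 4$, so $N[a]$ is reducible.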

Lemma 3.1 For any vertex $v$ such that $N[v]$ is reducible, there is an optimal solution $S^*$ to $G$ such that the vertex set $N[v]$ is entirely contained in a single clique in the graph $G \triangle S^*$.

Proof. Let $S$ be an optimal solution to the graph $G$, and pick any vertex $v$ such that $N[v]$ is reducible, i.e., $\rho(v) < |N[v]|$. Suppose that $N[v]$ is not entirely contained in a single clique in $G \triangle S$, i.e., $N[v] = X \cup Y$, where $X \neq \emptyset$ and $Y \neq \emptyset$, such that $Y$ is entirely contained in a clique $C_1$ in $G \triangle S$ while $X \cap C_1 = \emptyset$ (note that we do not assume that $X$ is contained in a single clique in $G \triangle S$).

Inserting all missing edges between vertices in $N[v]$ transforms the induced subgraph $G[N[v]]$ into a clique. Therefore, $\omega(G[N[v]]) \le \delta(v)$. Combining this with Corollary 2.3, we get

$$\omega(G) \le \omega(G[N[v]]) + \omega(G[\overline{N[v]}]) + \gamma(N[v]) \le \delta(v) + \omega(G[\overline{N[v]}]) + \gamma(N[v]) = \omega(G[\overline{N[v]}]) + \rho(v) - \delta(v). \tag{1}$$

Let $S(V, N[v])$ be the set of pairs in the solution $S$ in which at least one vertex is in $N[v]$, and let $S(X, Y)$ be the set of pairs in $S$ in which one vertex is in $X$ and the other vertex is in $Y$. Also, let $P(X, Y)$ be the set of all pairs $\{x, y\}$ such that $x \in X$ and $y \in Y$. Obviously, $\pi(S(V, N[v])) \ge \pi(S(X, Y))$, because $S(X, Y) \subseteq S(V, N[v])$ (both $X$ and $Y$ are subsets of $N[v]$). Moreover, since the solution $S$ places the sets $X$ and $Y$ in different cliques, $S$ must delete all edges between $X$ and $Y$. Therefore, $S(X, Y)$ is exactly the set of edges in $G$ with one end in $X$ and the other end in $Y$. Also, by the definition of $\delta(v)$ and because both $X$ and $Y$ are subsets of $N[v]$, the total weight of all anti-edges between $X$ and $Y$ is bounded by $\delta(v)$. Thus, $\pi(S(X, Y)) + \delta(v) \ge \pi(P(X, Y))$. Now, by Lemma 2.5,

$$\omega(G) \ge \omega(G[\overline{N[v]}]) + \pi(S(V, N[v])) \ge \omega(G[\overline{N[v]}]) + \pi(S(X, Y)) \ge \omega(G[\overline{N[v]}]) + \pi(P(X, Y)) - \delta(v). \tag{2}$$

Combining (1) and (2), and noting that the weight of each vertex pair is at least 1, we get

$$|X|\,|Y| \le \pi(P(X, Y)) \le \rho(v) < |N[v]| = |X| + |Y|. \tag{3}$$

This can hold only when $|X| = 1$ or $|Y| = 1$. In both cases, we have $|X| \cdot |Y| = |X| + |Y| - 1$. Combining this with (3), and noting that all the quantities are integers, we must have

$$\pi(P(X, Y)) = \rho(v),$$

which, when combined with (1) and (2), gives

$$\omega(G) = \omega(G[\overline{N[v]}]) + \rho(v) - \delta(v) = \omega(G[\overline{N[v]}]) + \gamma(N[v]) + \delta(v). \tag{4}$$
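The integrality argument used in the step from (3) to (4) can be spelled out as follows; this is our own expansion, using only that $|X|, |Y| \ge 1$ and that $\rho(v)$ and $\pi(P(X, Y))$ are integers:

$$\begin{aligned}
|X|\,|Y| < |X| + |Y| &\iff (|X| - 1)(|Y| - 1) < 1 \\
&\iff (|X| - 1)(|Y| - 1) = 0 \quad \text{(a product of nonnegative integers)} \\
&\iff |X| = 1 \ \text{or}\ |Y| = 1,
\end{aligned}$$

and then, since $\rho(v) < |N[v]|$ with both sides integers,

$$|X| + |Y| - 1 = |X|\,|Y| \le \pi(P(X, Y)) \le \rho(v) \le |N[v]| - 1 = |X| + |Y| - 1,$$

so every inequality in the chain is an equality.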

Note that $\gamma(N[v]) + \delta(v)$ is the minimum cost of inserting edges into and deleting edges from the graph $G$ to make $N[v]$ a disjoint clique. Therefore, Equality (4) shows that if we first apply edge insertions/deletions of minimum weight to make $N[v]$ a disjoint clique and then apply an optimal solution to the induced subgraph $G[\overline{N[v]}]$, we obtain an optimal solution $S^*$ to the graph $G$. This completes the proof of the lemma, because the optimal solution $S^*$ has the vertex set $N[v]$ entirely contained in a single clique in the graph $G \triangle S^*$. □

Based on Lemma 3.1, we have the following reduction rule:

Step 1 For a vertex $v$ such that $N[v]$ is reducible, insert an edge for every anti-edge in $G[N[v]]$ to make $G[N[v]]$ a clique, and decrease $k$ accordingly.
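Step 1 can be sketched in a few lines of Python. The sketch assumes a mutable adjacency-set dict `adj`, a weight dict `w` keyed by two-element `frozenset`s, and that the caller has already verified that $N[v]$ is reducible; the function name `step1` is hypothetical.

```python
from itertools import combinations


def step1(adj, w, v):
    """Step 1 (sketch): insert every anti-edge inside N[v] so that G[N[v]]
    becomes a clique. Mutates `adj` in place and returns the inserted weight,
    i.e. delta(v), the amount by which k is decreased.
    The caller is assumed to have checked that N[v] is reducible."""
    inserted = 0
    for x, y in combinations(list(adj[v]), 2):
        if y not in adj[x]:
            adj[x].add(y)
            adj[y].add(x)
            inserted += w[frozenset((x, y))]
    return inserted
```

A caller would then update the parameter as `k -= step1(adj, w, v)`; applying the step a second time inserts nothing.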
After Step 1, the induced subgraph $G[N[v]]$ becomes a clique with $\delta(v) = 0$ and $\rho(v) = \gamma(N[v])$. Now we use the following rule to remove the vertices in $N(N[v])$ that are loosely connected to $N[v]$ (recall that $N(N[v])$ is the set of vertices that are not in $N[v]$ but are adjacent to some vertices in $N(v)$, and that for two vertex subsets $X$ and $Y$, $E(X, Y)$ denotes the set of edges that have one end in $X$ and the other end in $Y$).

Step 2 Let $v$ be a vertex such that $N[v]$ is reducible and Step 1 has been applied to it. For each vertex $x$ in $N(N[v])$, if $\pi(E(x, N(v))) \le |N[v]|/2$, then delete all edges in $E(x, N(v))$ and decrease $k$ accordingly.
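A possible Python sketch of Step 2, under the same illustrative assumptions (mutable adjacency sets `adj`, weights `w` keyed by two-element `frozenset`s, Step 1 already applied to a reducible $N[v]$). It deletes the edges of every outside vertex whose connection weight is at most $|N[v]|/2$, which is the case covered by the argument in the proof of Lemma 3.2; the function name is hypothetical.

```python
def step2(adj, w, v):
    """Step 2 (sketch), assuming N[v] is reducible and Step 1 has made
    G[N[v]] a clique. Every x in N(N[v]) with pi(E(x, N(v))) <= |N[v]|/2
    loses all its edges to N(v). Mutates `adj`; returns the deleted weight
    (the decrease of k)."""
    closed = adj[v] | {v}
    outside = {y for u in adj[v] for y in adj[u]} - closed   # N(N[v])
    deleted = 0
    for x in outside:
        links = adj[x] & adj[v]          # E(x, N(v)); x is never adjacent to v
        c = sum(w[frozenset((x, u))] for u in links)
        if c <= len(closed) / 2:
            for u in links:
                adj[x].discard(u)
                adj[u].discard(x)
            deleted += c
    return deleted
```

In the test graph below, $N[a]$ is a 5-clique; the outside vertex attached to three of its vertices survives (weight $3 > 2.5$), while the one attached to a single vertex is cut off.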

We say that a reduction step is safe if, after the step performs edge operations of total cost $c$, we obtain a new graph $G'$ such that the original graph $G$ has a solution of weight bounded by $k$ if and only if the new graph $G'$ has a solution of weight bounded by $k - c$.

Lemma 3.2 Step 2 is safe.



Proof. By Lemma 3.1, there is an optimal solution $S$ to the graph $G$ such that $N[v]$ is entirely contained in a single clique $C$ in the graph $G \triangle S$. We first prove, by contradiction, that the clique $C$ containing $N[v]$ in $G \triangle S$ has at most one vertex in $\overline{N[v]}$. Suppose that there are $r$ vertices $u_1, \ldots, u_r$ in $\overline{N[v]}$ that are in $C$, where $r \ge 2$. For $1 \le i \le r$, denote by $c_i$ the total weight of all edges between $u_i$ and $N[v]$, and by $c_i'$ the total weight of all pairs (both edges and anti-edges) between $u_i$ and $N[v]$. Note that $c_i' \ge |N[v]|$ and $\sum_{i=1}^{r} c_i \le \gamma(N[v])$. Then, in the optimal solution $S$ to $G$, the total weight of the edges inserted between $N[v]$ and $\overline{N[v]}$ is at least

$$\sum_{i=1}^{r} (c_i' - c_i) \ge \sum_{i=1}^{r} (|N[v]| - c_i) = r\,|N[v]| - \sum_{i=1}^{r} c_i \ge r\,|N[v]| - \gamma(N[v]) \ge 2|N[v]| - \gamma(N[v]) > 2|N[v]| - |N[v]| = |N[v]| > \gamma(N[v]),$$

where we have used the fact that $|N[v]| > \gamma(N[v])$ (by the conditions of the step, $\rho(v) = 2\delta(v) + \gamma(N[v]) < |N[v]|$). But this contradicts Corollary 2.4.

Therefore, there is at most one vertex $x$ in $N(N[v])$ that is in the clique $C$ containing $N[v]$ in the graph $G \triangle S$. Such a vertex $x$ must satisfy $\pi(E(x, N(v))) > |N[v]|/2$: otherwise, since the missing pairs between $x$ and $N[v]$ have total weight at least $|N[v]| - \pi(E(x, N(v))) \ge |N[v]|/2 \ge \pi(E(x, N(v)))$, deleting all edges in $E(x, N(v))$ would result in a solution that is at least as good as the one that inserts all missing edges between $x$ and $N[v]$ and makes $N[v] \cup \{x\}$ a clique. Thus, for a vertex $x$ in $N(N[v])$ with $\pi(E(x, N(v))) \le |N[v]|/2$, we can always assume that $x$ is not in the clique containing $N[v]$ in the graph $G \triangle S$. Consequently, deleting all edges in $E(x, N(v))$ for such a vertex $x$ is safe. □

The structure of $N[v]$ changes after the above steps, and there are two possible outcomes: (1) no vertex in $N(N[v])$ survives, and $N[v]$ becomes an isolated clique; then, by Corollary 2.2, we can simply delete the clique; (2) exactly one vertex $x$ remains in $N(N[v])$ (there cannot be more than one surviving vertex, since each survivor contributes more than $|N[v]|/2$ to $\gamma(N[v])$, which would contradict $\gamma(N[v]) \le \rho(v) < |N[v]|$). In case (2), the vertex set $N[v]$ can be divided into two parts, $X = N[v] \cap N(x)$ and $Y = N[v] \setminus X$. From the proofs of the above lemmas, we are left with only two options: either disconnect $X$ from $x$, at edge cost $c_X$, or connect $Y$ and $x$, at edge cost $c_Y$. Obviously, $c_X > c_Y$. Since both options can be regarded as connecting or disconnecting the vertex set $N[v]$ and the vertex $x$, we can further reduce the graph using the following reduction step:

Step 3 Let $v$ be a vertex such that $N[v]$ is reducible and Steps 1 and 2 have been applied to it. If there still exists a vertex $x$ in $N(N[v])$, then merge $N[v]$ into a single vertex $v'$, connect $v'$ to $x$ with weight $c_X - c_Y$, set the weight of each anti-edge between $v'$ and any other vertex to $+\infty$, and decrease $k$ by $c_Y$.
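The merge in Step 3 can be sketched in Python as follows. This is an illustration under our own assumptions: `adj` and `w` are the mutable adjacency-set and pair-weight dicts used informally throughout (every pair has a weight, with `math.inf` standing for $+\infty$), exactly one vertex $x$ outside $N[v]$ is still adjacent to $N[v]$, and the label of the merged vertex, `"v*"`, is hypothetical.

```python
import math


def step3(adj, w, v, merged="v*"):
    """Step 3 (sketch): assumes N[v] is reducible, Steps 1-2 were applied,
    and exactly one vertex x outside N[v] is still adjacent to N[v].
    Replaces N[v] by one vertex `merged` joined to x with weight c_X - c_Y;
    every other pair at `merged` gets weight +inf. Returns c_Y (decrease of k)."""
    closed = adj[v] | {v}
    (x,) = {y for u in closed for y in adj[u]} - closed   # the unique survivor
    X = closed & adj[x]                 # part of N[v] adjacent to x
    Y = closed - X                      # part of N[v] not adjacent to x
    c_X = sum(w[frozenset((x, u))] for u in X)   # cost of disconnecting X from x
    c_Y = sum(w[frozenset((x, u))] for u in Y)   # cost of connecting Y to x
    for u in closed:                    # delete N[v] and its incident pairs
        for y in adj.pop(u):
            if y not in closed:
                adj[y].discard(u)
    for p in [p for p in w if p & closed]:
        del w[p]
    adj[merged] = {x}
    adj[x].add(merged)
    for z in adj:
        if z != merged:
            w[frozenset((merged, z))] = c_X - c_Y if z == x else math.inf
    return c_Y
```

The returned value is meant to be subtracted from $k$. Keeping only the difference $c_X - c_Y$ on the new edge preserves the gap between the two remaining options.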

The correctness of this step follows immediately from the above argument.

Note that the conditions for all the above steps are checked only once. If they are satisfied, we apply all three steps one by one; otherwise, we do nothing at all. So the steps are actually parts of a single reduction rule, presented as follows:
The Rule. Let $v$ be a vertex satisfying $2\delta(v) + \gamma(N[v]) < |N[v]|$. Then:

1. add edges to make $G[N[v]]$ a clique and decrease $k$ accordingly;

2. for each vertex $x$ in $N(N[v])$ with $\pi(E(x, N[v])) \le |N[v]|/2$, remove all edges in $E(x, N[v])$ and decrease $k$ accordingly;

3. if a vertex $x$ in $N(N[v])$ survives, merge $N[v]$ into a single vertex (as described above) and decrease $k$ accordingly.

Now the following lemma directly implies Theorem 1.1.

Lemma 3.3 If an instance of the weighted Cluster Editing problem reduced by our reduction rule has more than $2k$ vertices, then it has no solution of weight at most $k$.

Proof. We divide the cost of inserting/deleting a pair $\{u, v\}$ into two halves and assign them to $u$ and $v$ equally. Then we count the costs on all vertices.

For any two vertices at distance 2, at most one of them does not appear in a solution $S$: otherwise, they would have to belong to the same clique in $G \triangle S$ because of their common neighbors, but the edge between them is missing. Thus, if we let $\{v_1, v_2, \ldots, v_r\}$ be the vertices not appearing in $S$, then any two of their closed neighborhoods $N[v_1], N[v_2], \ldots, N[v_r]$ are either identical (when the vertices are in the same simple series module) or disjoint. The cost in each $N[v_i]$ is $\delta(v_i) + \gamma(N[v_i])/2 = \rho(v_i)/2$, which is at least $|N[v_i]|/2$, because by our reduction rule, in the reduced instance we have $\rho(v) \ge |N[v]|$ for each vertex $v$. Each vertex not in any of the sets $N[v_i]$ is contained in at least one pair of $S$ and therefore bears cost at least $1/2$. Summing up, we get a lower bound of $|V|/2$ on the total cost. Thus, if the solution $S$ has weight bounded by $k$, then $k \ge |V|/2$, i.e., the graph has at most $2k$ vertices. □
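The summation at the end of this proof can be written out explicitly. In the following display (our notation, not the paper's), $c(u)$ is the cost assigned to vertex $u$, $N_1 = N[u_1], \ldots, N_t = N[u_t]$ are the distinct sets among $N[v_1], \ldots, N[v_r]$, and $U$ is the set of vertices lying in none of them:

$$\pi(S) = \sum_{u \in V} c(u) \ge \sum_{j=1}^{t} \frac{\rho(u_j)}{2} + \frac{|U|}{2} \ge \sum_{j=1}^{t} \frac{|N_j|}{2} + \frac{|U|}{2} = \frac{|V|}{2},$$

where the last equality uses that $N_1, \ldots, N_t$ and $U$ partition $V$.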

4 On unweighted and real-weighted versions 

We now show how to adapt the algorithm of the previous section to the unweighted and real-weighted versions. Only slight modifications are required; therefore, the proofs of their correctness are omitted for lack of space.

Unweighted version. The kernelization algorithm as presented does not work for the unweighted version. The trouble arises in Step 3, where merging $N[v]$ is not a valid operation in an unweighted graph. Fortunately, this can easily be circumvented by replacing Step 3 with the following new rule:

Step 3 (U) Let $v$ be a vertex such that $N[v]$ is reducible and Steps 1 and 2 have been applied to it. If there still exists a vertex $x$ in $N(N[v])$, then replace $N[v]$ by a complete subgraph $K_{|X| - |Y|}$ and connect $x$ to all vertices of this subgraph.
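A possible Python sketch of Step 3 (U), under illustrative assumptions: `adj` is a mutable adjacency-set dict, exactly one vertex $x$ outside $N[v]$ is still adjacent to $N[v]$, and the new clique vertices are given hypothetical labels `("k", i)`. The text does not state how $k$ changes in this step; the sketch returns $|Y|$ as the decrease of $k$, which is our own assumption by analogy with the weighted Step 3 (where $k$ drops by $c_Y$, and $c_Y = |Y|$ for unit weights).

```python
def step3_unweighted(adj, v):
    """Step 3 (U) (sketch): assumes N[v] is reducible, Steps 1-2 were applied
    and exactly one x outside N[v] is still adjacent to N[v]. Replaces N[v]
    by a clique K_{|X|-|Y|} fully joined to x, where X = N[v] & N(x) and
    Y = N[v] minus X. Returns |Y|, which we assume (by analogy with the
    weighted Step 3) is the decrease of k."""
    closed = adj[v] | {v}
    (x,) = {y for u in closed for y in adj[u]} - closed   # the unique survivor
    X = closed & adj[x]
    Y = closed - X
    for u in closed:                       # delete N[v] from the graph
        for y in adj.pop(u):
            if y not in closed:
                adj[y].discard(u)
    new = [("k", i) for i in range(len(X) - len(Y))]   # hypothetical labels
    for i, u in enumerate(new):            # build the clique and join it to x
        adj[u] = set(new[:i] + new[i + 1:]) | {x}
        adj[x].add(u)
    return len(Y)
```

With unit weights the gadget keeps the gap between the two options: cutting $x$ off the new clique costs $|X| - |Y|$, and joining it costs nothing.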

The correctness of this new rule follows from arguments similar to those in the previous section, and it is easy to check that the first two steps apply to the unweighted version. Moreover, the proof of Lemma 3.3 can easily be adapted to the new rule.

Real-weighted version. More difficulties arise when weights are allowed to be real numbers instead of only positive integers. The first problem is that, without integrality, (3) does not imply (4). This can be fixed by changing the definition of a reducible closed neighborhood from $\rho(v) < |N[v]|$ to $\rho(v) \le |N[v]| - 1$ (the two are equivalent for integers); then (3) becomes

$$|X|\,|Y| \le \pi(P(X, Y)) \le \rho(v) \le |N[v]| - 1 = |X| + |Y| - 1. \tag{5}$$

With this definition of reducible closed neighborhoods, Steps 1 and 2 remain the same.

The second problem is Step 3, in which we need to maintain the validity of the weights. Recall that we require all weights of a weight function to be at least 1. Although this holds trivially for integral weight functions, it becomes problematic for real weight functions. More specifically, in Step 3 the edge $[x, v']$ could be assigned a weight $c_X - c_Y < 1$ when $c_X$ and $c_Y$ differ by less than 1. This can be fixed with an extension of Step 3:
Step 3 (R) Let $v$ be a vertex such that $N[v]$ is reducible and Steps 1 and 2 have been applied to it. If there still exists a vertex $x$ in $N(N[v])$, then

• if $c_X - c_Y \ge 1$, merge $N[v]$ into a single vertex $v'$, connect $v'$ to $x$ with weight $c_X - c_Y$, set the weight of each anti-edge between $v'$ and any other vertex to $+\infty$, and decrease $k$ by $c_Y$;

• if $c_X - c_Y < 1$, merge $N[v]$ into two vertices $v'$ and $v''$, connect $v'$ to $x$ with weight 2 and $v'$ to $v''$ with weight $2 - (c_X - c_Y)$, set the weight of each anti-edge between $v'$ or $v''$ and any other vertex to $+\infty$, and decrease $k$ by $c_X - 2$.
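The bookkeeping of Step 3 (R) can be checked numerically. The Python sketch below is our own illustration (the string labels `"v'"`, `"v''"`, `"x"` and all function names are hypothetical): it builds the gadget weights for given $c_X > c_Y$ and verifies that every weight is at least 1 and that, once the change in $k$ is added back, cutting $x$ off still costs $c_X$ while keeping $x$ with $v'$ still costs $c_Y$, as in the original two options.

```python
import math


def step3_real_gadget(c_X, c_Y):
    """Edge weights of the Step 3 (R) gadget and the amount k is decreased by.
    Assumes c_X > c_Y. Labels "v'", "v''", "x" are illustrative."""
    d = c_X - c_Y
    if d >= 1:
        return {("v'", "x"): d}, c_Y
    return {("v'", "x"): 2.0, ("v'", "v''"): 2.0 - d}, c_X - 2.0


def resolution_costs(c_X, c_Y):
    """Total cost (edits in the gadget + decrease of k) of the two cheapest
    ways to resolve the gadget: cutting x off, or keeping x with v'
    (which, in the two-vertex case, means cutting v' from v'')."""
    weights, dec = step3_real_gadget(c_X, c_Y)
    cut_x = weights[("v'", "x")] + dec
    keep_x = weights.get(("v'", "v''"), 0.0) + dec
    return cut_x, keep_x


def gadget_is_faithful(c_X, c_Y):
    """All gadget weights are >= 1 and the two options still cost c_X and c_Y."""
    weights, _ = step3_real_gadget(c_X, c_Y)
    cut_x, keep_x = resolution_costs(c_X, c_Y)
    return (min(weights.values()) >= 1
            and math.isclose(cut_x, c_X) and math.isclose(keep_x, c_Y))
```

For $c_X = 1.6$ and $c_Y = 1.1$, the second case applies and the returned decrease of $k$ is $c_X - 2 = -0.4$, i.e., $k$ actually grows slightly, which is the subtlety discussed next.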

The new case merely maintains the validity of the weights and makes no real difference from the original case. However, there is one subtlety we need to point out: the second case might increase $k$ slightly. This happens when $c_X - 2 < 0$, in which case we actually increase $k$ by $2 - c_X$. We do not worry about this, for both theoretical and practical reasons. Theoretically, the definition of kernelization does not forbid increasing $k$, and we refer readers who are uncomfortable with this to the monographs [13, 16, 24]. Practically, 1) it does not really enlarge or complicate the graph, so any reasonable algorithm will behave the same; 2) this case cannot happen too often, since otherwise the graph would be very similar to a star and easy to solve; 3) even using the original value of $k$, our kernel size is bounded by $3k$.

The proof of Lemma 3.3 goes through almost unchanged, with only a slightly larger constant. Due to the relaxation of the condition for a reducible closed neighborhood from $\rho(v) < |N[v]|$ to $\rho(v) \le |N[v]| - 1$, the number of vertices in the kernel for the real-weighted version is bounded by $2.5k$.

5 Discussion 

One very interesting observation is that, for the unweighted version, by the definition of simple series modules, all of the following equalities hold:

$$N[u] = N[M], \quad \delta(u) = \delta(M), \quad \gamma(N[u]) = \gamma(N[M]),$$

where $M$ is the simple series module containing vertex $u$, and $\delta(M)$ is the natural generalization of the definition of $\delta(v)$. Thus it does not matter whether we use the module or any vertex in it; that is, every vertex is a full representative of the simple series module it lies in. Although there is a long list of linear-time algorithms for computing the modular decomposition of an undirected graph (see the comprehensive survey by de Montgolfier [23]), they are very time-consuming in practice because of the large constant hidden behind the big-O [26]; since the modular decomposition needs to be reconstructed after each iteration, avoiding it helps. It is somewhat surprising that the previous kernelization algorithms can be significantly simplified by avoiding modular decomposition. Even more surprisingly, this enables our approach to apply to the weighted version, because one major weakness of modular decomposition is its inability to handle weights.
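The observation that every vertex represents its module can be illustrated without computing a modular decomposition. The Python sketch below groups vertices by their closed neighborhood $N[u]$ using hashing; it assumes, as is standard, that critical cliques (simple series modules in the sense used here) are exactly the classes of vertices with identical closed neighborhoods. The adjacency-set dict `adj` and the function names are our own illustrative choices.

```python
from collections import defaultdict


def closed_twin_classes(adj):
    """Group vertices by their closed neighbourhood N[u]. Vertices sharing
    the same N[u] form one critical clique (simple series module); a single
    hashing pass suffices, without any modular decomposition."""
    classes = defaultdict(list)
    for u in adj:
        classes[frozenset(adj[u] | {u})].append(u)
    return list(classes.values())


def representatives(adj):
    """One vertex per class; by the observation above, any member will do."""
    return [members[0] for members in closed_twin_classes(adj)]
```

For the triangle a, b, c with a pendant vertex d attached to c, the classes are $\{a, b\}$, $\{c\}$, and $\{d\}$, since only a and b share the same closed neighborhood.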

A similar problem on inconsistent information is the Feedback Arc Set in Tournaments (FAST) problem, which asks for a minimum number of arcs whose reversal makes a tournament transitive. Given the striking resemblance between Cluster Editing and FAST, and a series of "one-stone-two-birds" approximation algorithms [1, 27] that exploit only the commonalities between them, we are strongly tempted to compare the results for these two problems from the parameterized perspective.

For kernelization, our result already matches the best kernel for weighted FAST, of size $(2 + \epsilon)k$, by Bessy et al. [5], which is obtained using a complicated PTAS [21].

For algorithms, Alon et al. [2] generalized the famous color-coding approach to give a subexponential FPT algorithm for FAST. This is the first subexponential FPT algorithm outside bidimensionality theory, which had been a systematic way to obtain subexponential algorithms and has been intensively studied. This exciting work opens a new direction for further research. Indeed, immediately after the appearance of [2], Feige reported an improved algorithm for the unweighted version [14] that is far simpler and uses a purely combinatorial approach. Recently, Karpinski and Schudy obtained the same result for the weighted version [20]. Based on the striking resemblance between the two problems, we conjecture that there is also a subexponential algorithm for the Cluster Editing problem.